data environment
A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain
Lu, Yining, Tang, Wenyi, Johnson, Max, Jung, Taeho, Jiang, Meng
Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, which incurs high costs for data collection, integration, and management, as well as privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to use information directly from data owners who retain full control over their sources. However, decentralization brings a challenge: the many independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system includes a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of the responses it helps generate and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a +10.7% performance improvement over its centralized counterpart in realistic, unreliable data environments. Notably, it approaches the upper-bound performance of centralized systems under ideally reliable data environments. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (2 more...)
- Information Technology > Data Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)
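The reliability-scoring idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the score update rule (an exponential moving average) and the similarity-times-reliability ranking are assumptions chosen to make the mechanism concrete.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    reliability: float = 0.5  # neutral prior before any feedback

class ReliabilityIndex:
    def __init__(self, alpha: float = 0.2):
        self.alpha = alpha  # smoothing factor for the moving-average update
        self.sources: dict[str, Source] = {}

    def add_source(self, name: str) -> None:
        self.sources[name] = Source(name)

    def update(self, name: str, response_quality: float) -> None:
        # Move the score toward the observed quality of the response
        # this source contributed to (quality in [0, 1]).
        s = self.sources[name]
        s.reliability = (1 - self.alpha) * s.reliability + self.alpha * response_quality

    def rank(self, candidates: list[tuple[str, float]]) -> list[tuple[str, float]]:
        # candidates: (source_name, similarity); reliable sources are
        # prioritized by scaling similarity with the source's score.
        return sorted(
            ((name, sim * self.sources[name].reliability) for name, sim in candidates),
            key=lambda pair: pair[1],
            reverse=True,
        )

index = ReliabilityIndex()
for name in ("A", "B"):
    index.add_source(name)
index.update("A", 1.0)   # source A contributed to a good answer
index.update("B", 0.0)   # source B contributed to a poor one
ranked = index.rank([("A", 0.8), ("B", 0.9)])  # A now outranks B
```

In the paper, the score updates themselves are recorded via smart contracts rather than a local dictionary; the blockchain layer is omitted here.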
Local Performance vs. Out-of-Distribution Generalization: An Empirical Analysis of Personalized Federated Learning in Heterogeneous Data Environments
Hussaini, Mortesa, Theiß, Jan, Stein, Anthony
In Federated Learning with heterogeneous data environments, local models tend to converge to their own local optima during local training, deviating from the overall data distribution. Aggregating these local updates, e.g., with FedAvg, often does not align with the global optimum (client drift), resulting in an update that is suboptimal for most clients. Personalized Federated Learning approaches address this challenge by focusing exclusively on the average local performance of clients' models on their own data distributions. Generalization to out-of-distribution samples, which is a substantial benefit of FedAvg and a significant component of robustness, appears to be inadequately incorporated into assessment and evaluation. This study thoroughly evaluates Federated Learning approaches on both their local performance and their generalization capabilities. To this end, we examine different stages within a single communication round to enable a more nuanced understanding of the considered metrics. Furthermore, we propose and incorporate a modified FedAvg, designated Federated Learning with Individualized Updates (FLIU), which extends the algorithm with a straightforward individualization step using an adaptive personalization factor. We evaluate and compare the approaches empirically on MNIST and CIFAR-10 under various distributional conditions, including benchmark IID and pathological non-IID settings, as well as additional novel test environments with Dirichlet distributions specifically developed to stress the algorithms on complex data heterogeneity.
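The individualization step described for FLIU can be sketched as a blend of the aggregated global model and each client's local model. This is a simplified sketch under assumptions: the function names and the fixed personalization factor below are illustrative, not the paper's exact adaptive rule.

```python
def fedavg(client_weights: list[list[float]]) -> list[float]:
    # Plain FedAvg: coordinate-wise mean of the clients' model weights.
    n = len(client_weights)
    return [sum(ws) / n for ws in zip(*client_weights)]

def individualized_update(local: list[float], global_w: list[float],
                          alpha: float) -> list[float]:
    # alpha = personalization factor: 0 keeps the global model,
    # 1 keeps the purely local model; FLIU adapts this per client.
    return [alpha * l + (1 - alpha) * g for l, g in zip(local, global_w)]

clients = [[1.0, 2.0], [3.0, 4.0]]
g = fedavg(clients)                                    # averaged global model
w0 = individualized_update(clients[0], g, alpha=0.5)   # client 0's blend
```

The interesting design question the paper studies is how this blend trades local accuracy against out-of-distribution generalization: larger alpha favors the former at the expense of the latter.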
Policy-Driven AI in Dataspaces: Taxonomy, Explainability, and Pathways for Compliant Innovation
Chandra, Joydeep, Navneet, Satyam Kumar
As AI-driven dataspaces become integral to data sharing and collaborative analytics, ensuring privacy, performance, and policy compliance presents significant challenges. This paper provides a comprehensive review of privacy-preserving and policy-aware AI techniques, including Federated Learning, Differential Privacy, Trusted Execution Environments, Homomorphic Encryption, and Secure Multi-Party Computation, alongside strategies for aligning AI with regulatory frameworks such as GDPR and the EU AI Act. We propose a novel taxonomy to classify these techniques based on privacy levels, performance impacts, and compliance complexity, offering a clear framework for practitioners and researchers to navigate trade-offs. Key performance metrics -- latency, throughput, cost overhead, model utility, fairness, and explainability -- are analyzed to highlight the multi-dimensional optimization required in dataspaces. The paper identifies critical research gaps, including the lack of standardized privacy-performance KPIs, challenges in explainable AI for federated ecosystems, and semantic policy enforcement amidst regulatory fragmentation. Future directions are outlined, proposing a conceptual framework for policy-driven alignment, automated compliance validation, standardized benchmarking, and integration with European initiatives like GAIA-X, IDS, and Eclipse EDC. By synthesizing technical, ethical, and regulatory perspectives, this work lays the groundwork for developing trustworthy, efficient, and compliant AI systems in dataspaces, fostering innovation in secure and responsible data-driven ecosystems.
- North America > United States > Florida (0.04)
- Europe > Switzerland (0.04)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- (3 more...)
- Research Report (1.00)
- Overview (1.00)
- Law > Statutes (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Government (1.00)
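One technique from the taxonomy above can be made concrete: the Laplace mechanism for differential privacy applied to a counting query. This is a generic textbook sketch, not taken from the paper; the epsilon value and the query are illustrative choices.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling of a Laplace(0, scale) random variate.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_count(records: list[bool], epsilon: float) -> float:
    # A count query has L1 sensitivity 1, so the noise scale is 1/epsilon.
    true_count = sum(records)
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)  # fixed seed so the sketch is reproducible
noisy = private_count([True] * 40 + [False] * 60, epsilon=1.0)
```

The privacy/performance trade-off the taxonomy classifies shows up directly here: a smaller epsilon means stronger privacy but a noisier, less useful count.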
Context-Aware Rule Mining Using a Dynamic Transformer-Based Framework
Liu, Jie, Zhang, Yiwei, Sheng, Yuan, Lou, Yujia, Wang, Haige, Yang, Bohuan
This study proposes a dynamic rule mining algorithm based on an improved Transformer architecture, aiming to improve the accuracy and efficiency of rule mining in dynamic data environments. As data volume and complexity increase, traditional data mining methods struggle to cope with dynamic data that has strong temporal and variable characteristics, so new algorithms are needed to capture temporal regularities in the data. By improving the Transformer architecture and introducing a dynamic weight adjustment mechanism and a temporal dependency module, we enable the model to adapt to data changes and mine more accurate rules. Experimental results show that, compared with traditional rule mining algorithms, the improved Transformer model achieves significant improvements in rule mining accuracy, coverage, and stability. The contribution of each module to the algorithm's performance is further verified by ablation experiments, demonstrating the importance of the temporal dependency and dynamic weight adjustment mechanisms. In addition, although the improved model faces certain challenges in computational efficiency, its advantages in accuracy and coverage allow it to perform well on complex dynamic data. Future research will focus on optimizing computational efficiency and incorporating more deep learning techniques to expand the algorithm's scope of application, especially in practical domains such as finance, medical care, and intelligent recommendation.
- North America > United States > California (0.28)
- North America > United States > Minnesota (0.14)
- Asia (0.14)
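The paper's model is a full Transformer; as a drastically simplified stand-in, the sketch below shows only the temporal-weighting intuition behind "dynamic weight adjustment": a rule's support is discounted by a recency decay so that newer transactions count more. All names and the decay form here are assumptions for illustration.

```python
import math

def weighted_support(transactions: list[tuple[int, frozenset]],
                     itemset: frozenset, now: int, decay: float = 0.1) -> float:
    # Each transaction is (timestamp, items); its weight is exp(-decay * age),
    # so support drifts toward what recent data exhibits.
    total, hit = 0.0, 0.0
    for ts, items in transactions:
        w = math.exp(-decay * (now - ts))
        total += w
        if itemset <= items:
            hit += w
    return hit / total if total else 0.0

txns = [
    (0, frozenset({"a", "b"})),   # old
    (5, frozenset({"a"})),        # middle-aged, no "b"
    (9, frozenset({"a", "b"})),   # recent
]
s = weighted_support(txns, frozenset({"a", "b"}), now=10)
# s exceeds the unweighted support of 2/3 because the recent hit weighs more
```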
A Structured Reasoning Framework for Unbalanced Data Classification Using Probabilistic Models
Du, Junliang, Dou, Shiyu, Yang, Bohuan, Hu, Jiacheng, An, Tai
This paper studies a Markov network model for unbalanced data, aiming to address the classification bias and poor minority-class recognition of traditional machine learning models in environments with uneven class distributions. By constructing a joint probability distribution and conditional dependencies, the model achieves global modeling and reasoning-based optimization over sample categories. The study introduces marginal probability estimation and weighted-loss optimization strategies, combined with regularization constraints and structured reasoning methods, effectively improving the model's generalization ability and robustness. In the experiments, a real credit card fraud detection dataset was used for comparison against models such as logistic regression, support vector machines, random forests, and XGBoost. The results show that the Markov network performs well on metrics such as weighted accuracy, F1 score, and AUC-ROC, significantly outperforming traditional classifiers and demonstrating strong decision-making ability and applicability in unbalanced data scenarios. Future research can focus on efficient model training, structural optimization, and deep learning integration for large-scale unbalanced data environments, and promote wide application in areas such as financial risk control, medical diagnosis, and intelligent monitoring.
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Switzerland (0.04)
- Banking & Finance (1.00)
- Health & Medicine (0.89)
- Law Enforcement & Public Safety > Fraud (0.69)
- Information Technology > Security & Privacy (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.77)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
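The weighted-loss strategy mentioned in the abstract can be sketched minimally: minority-class errors are penalized more via weights inversely proportional to class frequency. The weighting rule below is a common convention, not necessarily the paper's exact choice, and the model itself (the Markov network) is omitted.

```python
import math
from collections import Counter

def class_weights(labels: list[int]) -> dict[int, float]:
    # w_c = N / (num_classes * count_c): rare classes get larger weights.
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

def weighted_log_loss(labels: list[int], probs: list[float],
                      weights: dict[int, float]) -> float:
    # Binary cross-entropy where each sample is scaled by its class weight,
    # so a missed minority (fraud) case costs far more than a missed majority case.
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, 1e-12), 1 - 1e-12)  # clamp to avoid log(0)
        ll = math.log(p) if y == 1 else math.log(1 - p)
        total += -weights[y] * ll
    return total / len(labels)

labels = [0] * 9 + [1]              # imbalanced: 9 negatives, 1 positive
w = class_weights(labels)            # positive class weighted 9x the negative
loss = weighted_log_loss(labels, [0.1] * 9 + [0.6], w)
```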
A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data
Salve, Aniruddha, Attar, Saba, Deshmukh, Mahesh, Shivpuje, Sayali, Utsab, Arnab Mitra
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by incorporating external, domain-specific data into the generative process. While LLMs are highly capable, they often rely on static, pre-trained datasets, limiting their ability to integrate dynamic or private data. Traditional RAG systems typically use a single-agent architecture to handle query generation, data retrieval, and response synthesis. However, this approach becomes inefficient when dealing with diverse data sources, such as relational databases, document stores, and graph databases, often leading to performance bottlenecks and reduced accuracy. This paper proposes a multi-agent RAG system to address these limitations. Specialized agents, each optimized for a specific data source, handle query generation for relational, NoSQL, and document-based systems. These agents collaborate within a modular framework, with query execution delegated to an environment designed for compatibility across various database types. This distributed approach enhances query efficiency, reduces token overhead, and improves response accuracy by ensuring that each agent focuses on its specialized task. The proposed system is scalable and adaptable, making it ideal for generative AI workflows that require integration with diverse, dynamic, or private data sources. By leveraging specialized agents and a modular execution environment, the system provides an efficient and robust solution for handling complex, heterogeneous data environments in generative AI applications.
- Asia > India > Maharashtra > Pune (0.05)
- Asia > Malaysia > Kuala Lumpur > Kuala Lumpur (0.04)
- Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
- Research Report (0.64)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.56)
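The coordinator-and-specialists pattern described above can be sketched as a simple dispatch table. The agent names and stubbed outputs below are illustrative assumptions; in the real system each agent would call an LLM to produce a source-specific query.

```python
from typing import Callable

def sql_agent(question: str) -> str:
    # Would prompt an LLM to emit SQL for a relational store; stubbed here.
    return f"SQL for: {question}"

def document_agent(question: str) -> str:
    return f"vector search for: {question}"

def graph_agent(question: str) -> str:
    return f"graph traversal for: {question}"

AGENTS: dict[str, Callable[[str], str]] = {
    "relational": sql_agent,
    "document": document_agent,
    "graph": graph_agent,
}

def route(question: str, source_type: str) -> str:
    # The coordinator delegates each query to the agent specialized
    # for its data source, keeping every agent's prompt and context small.
    agent = AGENTS.get(source_type)
    if agent is None:
        raise ValueError(f"no agent registered for source type {source_type!r}")
    return agent(question)

plan = route("top customers by revenue", "relational")
```

Keeping each agent narrowly scoped is what yields the token-overhead and accuracy benefits the abstract claims: no single prompt has to describe every schema and database dialect at once.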
A Theoretical Framework for AI-Driven Data Quality Monitoring in High-Volume Data Environments
Bangad, Nikhil, Jayaram, Vivekananda, Krishnappa, Manjunatha Sughaturu, Banarse, Amey Ram, Bidkar, Darshan Mohan, Nagpal, Akshay, Parlapalli, Vidyasagar
This paper presents a theoretical framework for an AI-driven data quality monitoring system designed to address the challenges of maintaining data quality in high-volume environments. We examine the limitations of traditional methods in managing the scale, velocity, and variety of big data and propose a conceptual approach leveraging advanced machine learning techniques. Our framework outlines a system architecture that incorporates anomaly detection, classification, and predictive analytics for real-time, scalable data quality management. Key components include an intelligent data ingestion layer, adaptive preprocessing mechanisms, context-aware feature extraction, and AI-based quality assessment modules. A continuous learning paradigm is central to our framework, ensuring adaptability to evolving data patterns and quality requirements. We also address implications for scalability, privacy, and integration within existing data ecosystems. While practical results are not provided, the framework lays a robust theoretical foundation for future research and implementation, advancing data quality management and encouraging the exploration of AI-driven solutions in dynamic environments.
- North America > United States > Texas (0.04)
- North America > United States > California (0.04)
- Oceania > New Zealand > North Island > Waikato (0.04)
- (2 more...)
- Overview (1.00)
- Research Report (0.64)
- Information Technology > Security & Privacy (1.00)
- Education (0.87)
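One component the framework names, anomaly detection for data quality, can be sketched concretely: flag a batch whose null rate deviates sharply from the history seen so far. The z-score rule, threshold, and metric choice below are assumptions for illustration, not prescriptions from the paper.

```python
import math

def zscore_flag(history: list[float], value: float,
                threshold: float = 3.0) -> bool:
    # Flag `value` if it lies more than `threshold` standard deviations
    # from the mean of the historical values.
    n = len(history)
    mean = sum(history) / n
    var = sum((x - mean) ** 2 for x in history) / n
    std = math.sqrt(var) or 1e-12  # guard against a perfectly flat history
    return abs(value - mean) / std > threshold

null_rates = [0.01, 0.012, 0.011, 0.009, 0.010, 0.011]  # per-batch null rates
ok = zscore_flag(null_rates, 0.011)    # typical batch, not flagged
bad = zscore_flag(null_rates, 0.25)    # null-rate spike, flagged
```

In the framework's continuous-learning setting, the history window itself would be updated as new batches arrive, letting the detector adapt to gradually evolving data patterns.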
Council Post: The Broken Promise Of AI: What Went Wrong Between 2012 And 2022
Lewis Wynne-Jones is the Vice President of Product at ThinkData Works. In 2012, two things happened that set the tone for the next decade of data investments. One was technological, the other was professional, and they both revolutionized the way we think about data. Together, these events directly led to the emergence of artificial intelligence as a business prerogative. Today, however, AI is fraught with problems, and fewer businesses, not more, are saying they're data-driven.
Why Data Cleaning Is Failing Your ML Models - And What To Do About It
Precise endeavors must be done to exacting standards in clean environments. Surgeons scrub in, rocket scientists work in clean rooms, and data scientists…well we try our best. We've all heard the platitude, "garbage in, garbage out," so we spend most of our time doing the most tedious part of the job: data cleaning. Unfortunately, no matter how hard we scrub, poor data quality is often too pervasive and invasive for a quick shower. Our research across the data stacks of more than 150 organizations shows an average of 70 impactful data incidents a year for every 1,000 tables in an environment.
AI Algorithms Could Rapidly Deploy to the Battlefield Under New Initiative
The Pentagon's Joint Artificial Intelligence Center recently started building a joint operating system and integration layer that combatant commands and other military components could eventually use to rapidly make and field artificial intelligence algorithms. This work is one key piece of the center's new Artificial Intelligence and Data Accelerator, or AIDA, JAIC Director Lt. Gen. Michael Groen confirmed this week during the NDIA 2022 Expeditionary Warfare Conference. "AIDA brings us, in small teams, out to the combatant commanders--now, for those of you who have been in combatant commands, or you're familiar with that environment--combat commanders have all of the challenges, all the problems and only some capability, right? And so what we're trying to do from an information advantage perspective is bring them the advantages of good data and good artificial intelligence-generating insights," he explained. Launched last year by Deputy Defense Secretary Kathleen Hicks, AIDA marks a broad initiative to boost data-based decision-making across the military's 11 combatant commands.
- North America > United States (0.39)
- Asia > China (0.39)